Systems and Methods for Machine Learning Enhanced Image Registration

ABSTRACT

Devices, methods, and program storage devices for training and leveraging machine learning (ML) models for use in image registration, especially on unaligned multispectral images, are disclosed, comprising: obtaining aligned multispectral image data; generating a first plurality of feature descriptors for features identified in the aligned multispectral image data; generating a training set of feature descriptor pairs based on the first plurality of feature descriptors; and training an ML model based on the training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data. The techniques may then: obtain unaligned multispectral image data; generate a second plurality of feature descriptors for features identified in the unaligned multispectral image data; and use the trained ML model to determine matches between features in the second plurality of feature descriptors, which matches may be used in performing image registration and/or fusion operations.

TECHNICAL FIELD

This disclosure relates generally to the field of digital image processing. More particularly, but not by way of limitation, it relates to techniques for training and developing machine learning (ML) models to aid in the performance of image registration of digital images, especially digital images comprising multispectral image data.

BACKGROUND

Image registration is the process of warping (or otherwise manipulating) an input image so that it can be overlaid on another image (sometimes referred to as a “reference image”), such that the respective contents of the two images align with each other as well as possible after the image registration operation has been performed. There are various reasons why image registration operations may be performed on captured images. For example, multiple image capture devices that are unaligned (e.g., image sensors that are mounted next to each other, such as in a vertical or horizontal orientation, on one side of a single electronic device) may be used to capture concurrent images of the same scene, which images may need to be registered with one another before further image processing operations may be performed on said images. Alternatively, it may be desirable to stitch together multiple images of different parts of the same scene that are captured by a single image capture device held at different poses and/or moved over a given time interval (e.g., in the case of a user capturing a panoramic image). In machine vision applications, stereo cameras may also be used to provide depth information (and other advantages), which, in turn, may require that the images from the stereo cameras are properly registered before further analysis may be performed.

An “image feature,” as used herein, preferably contains two components: 1) an image coordinate; and 2) a feature descriptor, which may be algorithmically derived from the image content in an area surrounding the feature's image coordinate. A typical method to register images may involve identifying matching features in both the input image and the reference image and then calculating a mapping function (e.g., a warping function, or other set of equations and/or parameters describing a desired transformation), such that the coordinates of the set of features in the input image are transformed, via the calculated mapping function, to be as close as possible to the coordinates of the respective matching features in the reference image. If enough matching features are found between the images being registered (and such features are sufficiently spatially distributed in the images), then the mapping function can be applied to each pixel in the input image (i.e., not merely the pixels at the coordinates of the identified matching features) to warp the input image so that it can be overlaid on the reference image. Different methods for generating mapping functions exist, e.g., depending on what changes are expected between the two images. Typical changes that can be accounted for by standard mapping functions include: pan, tilt, roll, translation, and/or zoom of the image capture devices. However, differences in spectral sensitivity between the image capture devices capturing the two images that are to be registered may result in an inability of traditional image feature matching approaches to successfully match features between the two images, as will be discussed further herein.
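
By way of illustration only, the following non-limiting sketch shows how such a feature-based registration pipeline could be composed using the open source OpenCV library; the choice of ORB features, the use of a homography as the mapping function, and the RANSAC reprojection threshold are illustrative assumptions, not requirements of the techniques disclosed herein:

    import cv2
    import numpy as np

    def register(input_img, reference_img):
        # Detect features: each is an image coordinate plus a descriptor.
        orb = cv2.ORB_create()
        kp_in, des_in = orb.detectAndCompute(input_img, None)
        kp_ref, des_ref = orb.detectAndCompute(reference_img, None)
        # Match binary descriptors by Hamming distance.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des_in, des_ref)
        src = np.float32([kp_in[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # Mapping function: a homography, fit with RANSAC to reject bad matches.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        # Apply the mapping to every pixel, not just the matched coordinates.
        h, w = reference_img.shape[:2]
        return cv2.warpPerspective(input_img, H, (w, h))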

As may be appreciated, identifying a good set of matching features between the images being registered is important in determining a successful mapping function. These features can be determined manually or automatically—although manual identification would be unlikely to work in many real-time or remote applications. The automatic feature identification methods may work by algorithmically analyzing image data and producing so-called “feature descriptors” for each identified feature, to go along with the feature's respective image coordinates. Image features preferably reflect portions of an image that may be found automatically and accurately between images being registered—even if there are minor changes in scale or perspective between the images being registered. These features typically comprise high contrast corners, edges, or other local extrema in intensity. In order to differentiate one image feature from another, a description of the feature, i.e., the aforementioned feature descriptor, is generated to allow the same point to be recognized if it appears in another image of the scene.

As mentioned above, these feature descriptors are also preferably invariant to changes in perspective and scale between images. Examples of image feature descriptors include: the Scale-Invariant Feature Transform (SIFT) feature, the Speeded-Up Robust Features (SURF) feature, the Oriented FAST and Rotated BRIEF (ORB) feature, the Binary Robust Invariant Scalable Keypoints (BRISK) feature, the Binary Features from Robust Orientation Segment Tests (BFROST) feature, and many others.
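
By way of non-limiting illustration, feature coordinates and descriptors such as those listed above may be extracted in a few lines of code, e.g., using OpenCV's ORB implementation; the input file name and parameter value below are hypothetical:

    import cv2

    img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(img, None)
    # Each image feature pairs a coordinate with its descriptor:
    features = list(zip([kp.pt for kp in keypoints], descriptors))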

Image registration is a relatively mature field and is often used in robotics, autonomous vehicles, and other systems where machine vision is a key aspect. Image registration is also used in more conventional applications, such as image editing (e.g., when attempting to add image features from one image to another image at the corresponding location in the other image). However, there are still opportunities for improvement, such as the aforementioned multispectral image data scenario, wherein the image capture devices capturing the images that are to be registered have a difference in spectral sensitivity (e.g., where one image capture device is predominantly sensitive to visible light wavelength ranges, and the other image capture device is predominantly sensitive to non-visible light wavelength ranges). Thus, it would be desirable to have methods and systems that provide for improved image registration, especially in instances of unaligned multispectral image data that is difficult for traditional image registration techniques to handle, e.g., by leveraging machine learning models built based on features detected from large training sets of aligned multispectral image data.

SUMMARY

Image sensors capable of capturing aligned multispectral image data, e.g., including image data in both visible wavelength ranges and non-visible wavelength ranges (such as infrared (IR)), offer the opportunity to increase the amount of detail and context achievable in a captured image, e.g., by leveraging the accuracy of structural correspondences between the aligned captured images. This means that, even though the visible light signal strength may be very low in a given scene, the multispectral signal strength (e.g., which includes non-visible light signal information) may still be appreciably above the level of noise, and may thus be used to aid in the performance of various image processing operations, such as image fusion, to generate an enhanced output image—assuming that proper correspondences may be determined between the image data in the visible wavelength ranges and the image data in the non-visible wavelength ranges.

As alluded to above, image sensors capable of capturing aligned multispectral image data may advantageously provide for a common pixel coordinate system between the image data in both the visible and non-visible wavelength ranges. In other words, because the captured image data is pre-aligned (e.g., via being captured by a single multispectral imaging sensor), it may be assumed that two features appearing at the same image coordinate in both the visible image data and the non-visible image data are indeed reflective of the same image feature in the scene—even if the respective feature descriptors for such features are not similar to one another (which may often be the case when comparing feature descriptors across different spectra).

Embodiments described herein may thus leverage machine learning systems and methods to better match these multispectral features across images. This may be done by creating a training dataset of pairs of feature descriptors, together with a strength value indicating whether each pair is likely a strong match or not. The feature descriptor dataset may be created by running standard image feature algorithms on the image data from each spectrum of a set of aligned, i.e., pre-registered, multispectral images and using the image coordinates of such identified features (as opposed to the similarity of the features' respective feature descriptors) to determine which image feature pairs (and, thus, which image feature descriptors) match each other. Training an ML-based model on aligned multispectral image data allows for the later use of such models to match features across spectra on images that are not aligned, i.e., do not share common pixel coordinates, and/or features that may not have similar feature descriptors (e.g., a feature that looks very different in the visible spectrum than it does in the non-visible spectrum, yet still represents the same feature in the scene). Using multispectral image data may be particularly advantageous in generating high quality output images, especially if the image data captured in each spectrum is of a similar quality level (e.g., in terms of sharpness, contrast, color reproduction, and/or resolution), because the additional spectra may enable more detail and/or context from a scene to be realized.

Thus, devices, methods, and non-transitory program storage devices (NPSDs) are disclosed herein to provide for ML-enhanced feature matching model creation leveraging multispectral image data, with improved accuracy for performing image registration on multispectral image data, especially unaligned multispectral image data, e.g., image data coming from two spatially-distinct image capture devices with different spectral sensitivities.

According to some embodiments, there is provided a method for image processing, comprising: obtaining aligned multispectral image data; generating a first plurality of feature descriptors for features identified in the aligned multispectral image data; generating a training set of feature descriptor pairs based on the generated first plurality of feature descriptors; and training an ML model based on the generated training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data. The aligned multispectral image data may, e.g., comprise a first plurality of aligned multispectral images, wherein each aligned multispectral image comprises at least a first portion of image data in a first spectrum and at least a second portion of image data in a second spectrum, wherein the first spectrum and second spectrum are in at least partially non-overlapping wavelength ranges, and wherein the first portion and second portion are aligned.

In some embodiments, the first plurality of aligned multispectral images may comprise sets of images having aligned visible image data (e.g., red-green-blue (RGB) data, RGB+Depth (RGB-D) data, YUV data, grayscale luminance data, etc.) and non-visible image data (e.g., IR data, such as Near IR (NIR) data or Long Wavelength IR (LWIR) data, or even ultraviolet (UV) data, etc.). In some embodiments, the first plurality of aligned multispectral images comprises images captured by the same image capture device.

In some embodiments, the first plurality of feature descriptors for the aligned multispectral image data comprises: a first set of feature descriptors for features in the first portion of image data in the first spectrum; and a second set of feature descriptors for features in the second portion of image data in the second spectrum. In some cases, each of the feature descriptor pairs in the generated training set comprises a match between a feature descriptor from the first set and a feature descriptor from the second set. In some cases, each of the feature descriptor pairs in the generated training set further comprises a strength value, e.g., a strength value that is based, at least in part, on a distance between: a location, within a given image of the first plurality of aligned multispectral images, of a respective feature represented in the first set of feature descriptors; and a location, within the given image, of the respective feature represented in the second set of feature descriptors. For example, smaller distances between the locations of matching features within a given feature descriptor pair may correlate to a larger strength value being assigned to the given feature descriptor pair, while larger distances may correlate to a smaller strength value being assigned.

According to still other embodiments, the method for image processing may further include steps to use the trained ML model to determine matches between features in unaligned multispectral image data, the method further comprising: obtaining unaligned multispectral image data; generating a second plurality of feature descriptors for features identified in the unaligned multispectral image data; and using the trained ML model to determine matches between features represented in the second plurality of feature descriptors for the unaligned multispectral image data. In some cases, the method may further perform an image registration operation on the unaligned multispectral image data based, at least in part, on the determined matches between features represented in the second plurality of feature descriptors for the unaligned multispectral image data, to generate aligned multispectral image data. In some cases, the method may still further perform a fusion operation (or other desired image processing operation) on the aligned multispectral image data to generate an enhanced output image. In some scenarios, the unaligned multispectral image data may comprise one or more images comprising image data captured by two or more spatially-distinct image capture devices, e.g., two separate single-spectrum cameras mounted at different locations on an electronic device.

Various NPSD embodiments are also disclosed herein. Such NPSDs are readable by one or more processors. Instructions may be stored on the NPSDs for causing the one or more processors to perform any of the ML-enhanced feature matching model creation and use techniques for multispectral image data disclosed herein.

Various programmable electronic devices are also disclosed herein, in accordance with the NPSD and method embodiments enumerated above. Such electronic devices may include one or more image capture devices, such as optical image sensors/camera units; a display; a user interface; one or more processors; and a memory coupled to the one or more processors. Instructions may be stored in the memory, the instructions causing the one or more processors to execute instructions in accordance with the various techniques disclosed herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates exemplary identified matching and mismatching features between aligned NIR and visible images of a captured scene, according to one or more embodiments.

FIG. 1B illustrates an exemplary registration of a visible image to an NIR image, based on the matching and mismatching features identified in FIG. 1A, according to one or more embodiments.

FIG. 2A illustrates exemplary identified matching features between aligned NIR and visible images of a captured scene, according to one or more embodiments.

FIG. 2B illustrates an exemplary registration of a visible image to an NIR image, based on the matching features identified in FIG. 2A, according to one or more embodiments.

FIG. 3 illustrates an exemplary system for training and use of an ML-enhanced feature matching model, according to one or more embodiments.

FIG. 4 is a flow chart illustrating a method of training and use of an ML-enhanced feature matching model, according to various embodiments.

FIG. 5 is a block diagram illustrating a programmable electronic computing device, in which one or more of the techniques disclosed herein may be implemented.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the inventions disclosed herein. It will be apparent, however, to one skilled in the art that the inventions may be practiced without these specific details. In other instances, structure and devices are shown in block diagram form in order to avoid obscuring the inventions. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes and may not have been selected to delineate or circumscribe the inventive subject matter, and, thus, resort to the claims may be necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” (or similar) means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of one of the inventions, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.

As used herein, the terms “multispectral image” and “multispectral image data” refer to captured images or image data having a first set of channels comprising captured light information over a first range of wavelengths (which may also be referred to herein as “primary channels”) and at least one additional channel comprising captured light information over a second range of wavelengths (which may also be referred to herein as “multispectral channels”), wherein the first and second ranges of wavelengths may be non-contiguous (or at least partially non-overlapping). In some examples, the multispectral channel of multispectral image data may predominantly measure light information in a non-visible wavelength range (e.g., infrared or ultraviolet ranges), while the primary channels of the multispectral image data may predominantly measure light information in a visible wavelength range (e.g., red, green, and/or blue channels). The availability of multispectral image data comprising both visible light information and corresponding non-visible light information from a captured scene may be used to improve or enhance the quality of a captured image. The techniques described herein may be used with active illumination in visible, non-visible, or a combination of visible and non-visible wavelength ranges. When the active illumination is in the non-visible wavelength ranges only, such active illumination may be used in low lighting conditions, which can improve image capture performance, even in instances when a user does not want a visible or perceptible camera flash.

In other examples, one or more of the channel(s) of a multispectral image or multispectral image data may comprise a “mixed channel,” i.e., a channel that measures light in both visible and non-visible wavelength ranges, as in the case of some RGB-W sensors, e.g., when used without an IR-cut filter. In that case, the response of the red channel is technically a mix of Red+IR, the green channel a mix of Green+IR, the blue channel a mix of Blue+IR, and the white channel a mix of Red+Green+Blue+IR. By solving this linear system of equations, the response to the IR signal in the captured scene may be isolated and treated as a separate multispectral channel, if desired.
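
A non-limiting sketch of this unmixing follows, assuming the idealized response model stated above (i.e., each color channel picks up the full IR signal); a real sensor would require calibrated mixing coefficients in place of the unit entries:

    import numpy as np

    # Rows: measured r, g, b, w responses; columns: true R, G, B, IR signals.
    A = np.array([[1, 0, 0, 1],
                  [0, 1, 0, 1],
                  [0, 0, 1, 1],
                  [1, 1, 1, 1]], dtype=float)

    def unmix(rgbw):
        """rgbw: array of shape (..., 4) holding raw r, g, b, w pixel values."""
        flat = rgbw.reshape(-1, 4).T          # shape (4, N)
        rgb_ir = np.linalg.solve(A, flat)     # recover R, G, B, IR per pixel
        return rgb_ir.T.reshape(rgbw.shape)

    # For this idealized model, the isolated IR channel also has a closed
    # form: IR = (r + g + b - w) / 2.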

In some cases, however, it may be preferable to use a camera with a dedicated, single-aperture, multispectral RGB-IR image sensor that is capable of creating aligned (i.e., pre-registered) images, wherein each pixel contains information in each spectrum (e.g., a camera that creates four-channel RGB+IR images). Preferably, the images created in each spectrum are of comparable quality (e.g., in terms of sharpness, contrast, color reproduction, and/or resolution) to each other—and to images that may be captured in a given spectrum by a typical dedicated single-spectrum image sensor. Other techniques and image sensor types may also be used to capture and generate aligned multispectral image data, e.g., cameras with a filter pattern, such as a 2×2 RGB+IR Bayer pattern, that may be used to produce low-resolution multispectral images of a captured scene; low-resolution image spectrometers that provide finer spectral sampling per pixel; or cameras with a filter wheel placed in front of the lens that is used to capture a sequence of images, each through a different filter, to obtain a series of full-resolution images, each within a different spectrum, etc. However, each of these other techniques and image sensor types may present unique challenges to overcome in order to obtain large enough quantities of reliably high-quality multispectral image data, as may be needed to train the various ML models that will be discussed herein.

As used in the various examples described in detail herein, the term “multispectral image” or “multispectral image data” will refer to image data that includes at least some IR signal data (unless otherwise noted), although it is to be understood that, in other examples, multispectral image data need not necessarily include IR signal data specifically, or non-visible data generally, and may instead include captured light information over any two or more non-contiguous (or at least partially non-overlapping) wavelength ranges.

Multispectral Image Examples

One of the reasons for using multispectral image data when capturing images of a scene is that additional information can be gathered beyond that which is evident in the visible spectrum. One example of this phenomenon is that an image of a photo of a human face (e.g., on a sheet of paper tacked to a bulletin board) is easily discernible from an image of an actual human face when looking in the Near IR (NIR) spectrum, but may have a very similar appearance to an image of an actual human face when looking in the visible (e.g., RGB) spectrum. Another example of this phenomenon might be evidenced in a situation where a car that has recently been parked among others in a parking lot is easily discernible in the Long Wavelength IR (LWIR) spectrum from the other cars in the parking lot, e.g., due to its hot engine; however, neither its color scheme nor its license plate can be identified in LWIR. Thus, it may be beneficial to register the LWIR image with a visible spectrum image of the same scene, e.g., in order to find out the color of the recently-parked car and/or what its license plate number is.

The examples above help illustrate that the different appearance of objects in different spectra allows for more information to be gathered by using multispectral image data, which, in turn, can facilitate better decisions being made about the images and/or allow for enhanced image processing. However, because objects do look different in different spectra, the corresponding feature descriptors calculated for such image features in each spectrum will also likely be very different. For instance, an image of the back of a car could have many corner features (e.g., in the text of the license plate) in the visible spectrum that are completely absent in the LWIR spectrum. Similarly, a local maximum feature detected in LWIR due to the red-hot exhaust of the car will not be present at all in the visible spectrum. Some features, however, such as the edges of the car, might be common to both the LWIR and visible spectrum images. However, since the image content around these features will be different in each spectral image, their respective feature descriptors could vary greatly. Thus, even if it were known which features were common between the images, because their descriptors would likely be so different, it would likely not be possible to reliably match them correctly.

So, not only are fewer correct feature matches likely to be found between images of different spectra, there are normally more incorrect matches, too. This can be seen, e.g., in FIG. 1A, which has an NIR image 100 on top, the corresponding visible image 110 at the bottom, and lines joining exemplary identified matching feature pairs (e.g., 1^IR-1^RGB, 2^IR-2^RGB, 3^IR-3^RGB, and 4^IR-4^RGB). In this example, the images 100/110 have been aligned for ease of illustration purposes, so any drawn lines which are not vertical (e.g., the lines for features 2 and 4) are necessarily representative of incorrect matches that have been identified by a feature matching algorithm, e.g., using ORB features or any other desired feature detector algorithm.

Turning now to FIG. 1B, an exemplary registration 150 of a visible RGB image 154 (corresponding to image 110 of FIG. 1A) to an NIR image 152 (corresponding to image 100 of FIG. 1A) is shown, based on the matching and mismatching features identified in FIG. 1A (including exemplary feature pairs 1-4, discussed above). As is illustrated in exemplary registration 150, the use of a large number of mismatched features (i.e., represented by all the non-vertical feature lines in FIG. 1A) has caused the registration to fail, as evidenced by visible image box 154 being both skewed and shrunk in the center of NIR image box 152, due to the aforementioned large number of mismatching features that were identified. [In this example, the mapping equation used was a 5-degree-of-freedom least squares best fit, with values for: X scale, Y scale, X offset, Y offset, and in-plane rotation. Though, as mentioned above, any desired mapping equation may be used to suit a given implementation.]
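
By way of illustration only, such a 5-degree-of-freedom fit could be computed by nonlinear least squares, e.g., as in the following sketch; the parameterization order and the identity initial guess are illustrative assumptions:

    import numpy as np
    from scipy.optimize import least_squares

    def fit_5dof(src, dst):
        """src, dst: (N, 2) arrays of matched feature coordinates."""
        def residuals(p):
            sx, sy, tx, ty, theta = p
            c, s = np.cos(theta), np.sin(theta)
            rot = np.array([[c, -s], [s, c]])
            # Apply per-axis scale, then in-plane rotation, then offset.
            mapped = (src * [sx, sy]) @ rot.T + [tx, ty]
            return (mapped - dst).ravel()
        p0 = [1.0, 1.0, 0.0, 0.0, 0.0]  # start from the identity mapping
        # Returns [X scale, Y scale, X offset, Y offset, rotation].
        return least_squares(residuals, p0).x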

Turning now to FIG. 2A, exemplary identified matching features, e.g., matching features as identified using a trained ML model such as those described herein, are shown between an aligned NIR image 200 and visible image 210 of a captured scene, according to one or more embodiments. As in FIG. 1A, lines are shown joining exemplary identified matching feature pairs (e.g., 5^IR-5^RGB, 6^IR-6^RGB, 7^IR-7^RGB, and 8^IR-8^RGB) between NIR image 200 on top and the corresponding visible image 210 at the bottom. In this example, too, the images 200/210 have already been aligned for ease of illustration purposes, so any drawn lines that are not vertical would again be representative of incorrect matches. However, as will be explained in further detail below, by leveraging a trained ML model for feature matching, the matching features identified in FIG. 2A are all connected with vertical lines, indicating that they reflect correct feature matches between image 200 and image 210.

Turning now to FIG. 2B, an exemplary registration 250 of a visible RGB image 254 (corresponding to image 210 of FIG. 2A) to an NIR image 252 (corresponding to image 200 of FIG. 2A) is shown, based on the matching features identified in FIG. 2A (including exemplary feature pairs 5-8, discussed above). As is illustrated in exemplary registration 250, the use of a large number of properly matched features (i.e., represented by all the vertical feature lines in FIG. 2A) has caused the registration to be successful, as evidenced by visible image box 254 being almost perfectly aligned with NIR image box 252 after the determined mapping function has been applied.

Exemplary System for Training and Use of ML-Enhanced Feature Matching Model

Turning now to FIG. 3, an exemplary system 300 for training and use of an ML-enhanced feature matching model is shown, according to one or more embodiments. The components above the dashed line in FIG. 3 represent the training phase for the feature matching model, while the components below the dashed line in FIG. 3 represent the use of the trained feature matching model at inference time. Beginning with block 302, a training set of aligned multispectral image data is obtained. As described above, the aligned multispectral image data may, e.g., comprise a first plurality of aligned multispectral images, wherein each aligned multispectral image comprises at least a first portion of image data in a first spectrum and at least a second portion of image data in a second spectrum, wherein the first spectrum and second spectrum are in (at least partially) non-overlapping wavelength ranges, and wherein the first portion and second portion are aligned. In some embodiments, the first plurality of aligned multispectral images may comprise sets of images having aligned visible image data (e.g., RGB data) and non-visible image data (e.g., IR data or, specifically, NIR data). In some embodiments, the first plurality of aligned multispectral images comprises images captured by the same image capture device of an electronic device, such as an RGB-IR sensor.

Next, at block 304, the system may generate a first plurality of feature descriptors for features identified at various positions across corresponding multispectral images. For example, a feature located at coordinates (X, Y) in a visible image of a captured scene may be said to match a feature located at corresponding coordinates (X, Y) (or within some acceptable threshold distance from coordinates (X, Y), e.g., within 2 pixels in any direction) in a corresponding non-visible image of the captured scene. As mentioned above, in some embodiments, a feature may comprise both a coordinate position and a feature descriptor. Any desired feature descriptor may be employed in a given implementation, e.g., the aforementioned Scale-Invariant Feature Transform (SIFT) feature, the Speeded-Up Robust Features (SURF) feature, the Oriented FAST and Rotated BRIEF (ORB) feature, the Binary Robust Invariant Scalable Keypoints (BRISK) feature, the Binary Features from Robust Orientation Segment Tests (BFROST) feature, or many others. Because of the vast potential differences in multispectral data values obtained for the same portion of a scene (e.g., as described in the car parking lot example above), the feature descriptors for a given pair of corresponding features in multispectral image data may actually be quite different from each other, which may cause a traditional image registration process to conclude that the two feature descriptors do not actually reflect a matching feature. However, due to the aligned nature of the multispectral image data 302 being used to train the ML model in this example, the respective coordinates of the corresponding features may be used to confirm whether (and, in some cases, to what extent) two identified features are a good match—regardless of how different their respective feature descriptors may be. In some cases, a strength value may also be assigned to each feature descriptor pair, wherein the strength value is based, at least in part, on a distance between the coordinates of the identified corresponding features in their respective images.
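
By way of non-limiting illustration, this coordinate-based pairing could be implemented as in the following sketch; the 2-pixel search radius follows the example above, while the data layout and function names are illustrative assumptions:

    import numpy as np
    from scipy.spatial import cKDTree

    def pair_by_coordinates(vis_feats, ir_feats, max_dist=2.0):
        """Each *_feats list holds ((x, y), descriptor) tuples from one spectrum."""
        tree = cKDTree(np.array([xy for xy, _ in ir_feats]))
        pairs = []
        for xy, vis_des in vis_feats:
            # Pair by coordinate proximity, not by descriptor similarity.
            dist, idx = tree.query(xy, distance_upper_bound=max_dist)
            if np.isfinite(dist):  # a non-visible feature lies within the radius
                pairs.append((vis_des, ir_feats[idx][1], dist))
        return pairs  # (visible descriptor, non-visible descriptor, distance)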

Turning now to a more detailed example, the first plurality of feature descriptors generated at block 304 for the aligned multispectral image data may comprise: a first set of feature descriptors representing features in a first portion of the aligned multispectral image data from block 302 that is in a first spectrum; and a second set of feature descriptors representing features in a second portion of the aligned multispectral image data from block 302 that is in a second spectrum. In some cases, each of the feature descriptor pairs that will be included in the training set generated at block 306 comprises a match (i.e., a match of at least some strength) between a feature descriptor from the first set and a feature descriptor from the second set. As mentioned above, the matches in some training scenarios may be determined based on the coordinates of the locations of the respective features within their respective images—rather than the similarity between their respective feature descriptors. In some cases, each of the feature descriptor pairs in the training set generated at block 306 may further comprise a strength value, e.g., a strength value that is based, at least in part, on a distance between: a location, within a given image of the first plurality of aligned multispectral images, of a respective feature represented in the first set of feature descriptors; and a location, within the given image, of the respective feature represented in the second set of feature descriptors. For example, the closer the coordinates of the respective features for two feature descriptors in a feature descriptor pair are, the higher that feature descriptor pair's strength value may be set. In some cases, it may be beneficial to include mismatched (or poorly-matched) feature descriptor pairs in the training set, i.e., those feature descriptor pairs that have a relatively lower strength value, such that the trained ML model can also learn examples of feature descriptor pairs that may not be indicative of actual matching features in input multispectral image data. When feature descriptor pairs that are similar to the mismatched feature descriptor pairs the model was trained on are then encountered by the model at inference time, they can either not be reported as matching features or be reported as matching features with such low strength values that they would not have a large influence on any subsequent image registration operations.
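
A non-limiting sketch of such training-set assembly follows; the exponential distance-to-strength mapping and the negative-sampling ratio are illustrative assumptions, not requirements of the disclosed techniques:

    import random
    import numpy as np

    def build_training_set(pairs, vis_feats, ir_feats, neg_per_pos=1):
        examples = []
        for vis_des, ir_des, dist in pairs:
            strength = float(np.exp(-dist))  # smaller distance -> larger strength
            examples.append((vis_des, ir_des, strength))
        # Mismatched pairs drawn at random serve as low-strength counterexamples.
        for _ in range(neg_per_pos * len(pairs)):
            (_, vis_des), (_, ir_des) = random.choice(vis_feats), random.choice(ir_feats)
            examples.append((vis_des, ir_des, 0.0))
        return examples  # (descriptor, descriptor, strength) triples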

Next, at block 308, the training set of matching feature descriptor pairs generated at block 306 may be used to train an ML model, wherein the resulting trained ML model is configured to determine matches between features in aligned multispectral image data, as well as in unaligned multispectral image data, e.g., so long as the unaligned multispectral image data has spectral sensitivity profiles similar to those of the aligned multispectral data upon which the model was trained. In some cases, it may be advantageous to also provide the input image data 302 itself to the ML model 308 (represented by the dashed line arrow between blocks 302 and 308), so that the model can learn to create better feature descriptors from the input image data and, e.g., learn how to match those feature descriptors, rather than relying on the feature descriptors provided by SURF/SIFT/ORB/etc. In other words, by providing the ML model with the images associated with matching features, the model itself can determine how best to describe such features, so that it can later match similar features that it may encounter at inference time (represented by the dashed line arrow between blocks 308 and 356). As may now be appreciated, even though the ML model was trained by leveraging the accurate feature location correspondences provided by aligned multispectral image data to learn what corresponding feature descriptors in one spectrum look like in another spectrum, the model may now also be advantageously employed to identify matching feature descriptors in image data sets across spectra that come from multiple image data sources that are not pre-aligned or pre-registered, e.g., spatially-distinct cameras mounted at different locations on a single electronic device. It will be appreciated that the machine learning model could take many forms, including: neural networks, genetic algorithms, Support Vector Machines (SVMs), Random Forests, K-Means, matrix factorization, expert systems, or any other machine learning approach.
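
By way of illustration only, one of the model families named above (a Random Forest) could be trained on concatenated descriptor pairs as in the following sketch; the disclosure leaves the particular model architecture open, and the estimator count below is an illustrative assumption:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def train_matcher(examples):
        # One input vector per pair: the two descriptors, concatenated.
        X = np.array([np.concatenate([np.asarray(a, float), np.asarray(b, float)])
                      for a, b, _ in examples])
        y = np.array([strength for _, _, strength in examples])
        model = RandomForestRegressor(n_estimators=100)
        model.fit(X, y)
        return model  # model.predict() then estimates strength for unseen pairs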

Moving now below the dashed line to the inference-time usage of the trained feature matching model, image data, e.g., unaligned multispectral image data (350), may be obtained. As described above, the unaligned multispectral data may comprise a plurality of images captured by two or more spatially-distinct image capture devices that also have different spectral sensitivity profiles, e.g., an image sensor sensitive to the non-visible range of the spectrum that captures NIR image data 352, mounted adjacent to an image sensor sensitive to the visible range of the spectrum that captures RGB image data 354. The image pairs (or sets of more than two corresponding images, if more than two such spatially-distinct image capture devices are present in a given electronic device) may be captured concurrently, or with some amount of temporal offset. The difference in spatial positioning between the image capture devices and/or any temporal offsets in the images captured by the image capture devices can each cause the corresponding features in the captured images to be located at vastly different image coordinate positions in the respective images. Also, as described earlier, because the image capture devices in this scenario have different spectral sensitivity profiles, it is unlikely that the feature descriptors for corresponding features will be similar enough to be used with traditional feature matching processes. As such, without the aid of a trained ML model, such as that described above with reference to block 308, performing image registration on unaligned multispectral image data may prove quite difficult.

Moving now to block 356, feature descriptors may be generated for the unaligned multispectral image data 350 (e.g., for features identified in NIR image data 352, as well as features identified within RGB image data 354). Then, at block 360, the trained ML model from block 308 may be used to determine matching features between the unaligned multispectral image data 350, i.e., between NIR image data 352 and RGB image data 354, in this example. Finally, at block 370, the multispectral image data may be registered based, at least in part, on the matching features identified at block 360. Once registered, various desired image processing operations may be performed on the now-aligned multispectral image data, e.g., an image fusion operation, to generate an enhanced output image. In other examples, the now-aligned multispectral image data may be used to provide additional contextual information to supplement the portions of the image captured in the visible spectrum, such as in the car parking lot example described above.
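
A non-limiting sketch of this inference-time flow (blocks 356-370) follows; the 0.5 strength threshold is an illustrative assumption, the homography estimation could equally be replaced by the 5-degree-of-freedom fit shown earlier, and the descriptor concatenation order is assumed to match the order used at training time:

    import cv2
    import numpy as np

    def register_unaligned(model, nir_feats, rgb_feats, threshold=0.5):
        """Each *_feats list holds ((x, y), descriptor) tuples from one camera."""
        # Score every cross-spectrum candidate pair in one batched call.
        cand = [(nir_xy, rgb_xy,
                 np.concatenate([np.asarray(rgb_des, float), np.asarray(nir_des, float)]))
                for nir_xy, nir_des in nir_feats
                for rgb_xy, rgb_des in rgb_feats]
        scores = model.predict(np.stack([x for _, _, x in cand]))
        src = np.float32([c[0] for c, s in zip(cand, scores) if s > threshold])
        dst = np.float32([c[1] for c, s in zip(cand, scores) if s > threshold])
        # Fit a mapping from the surviving matches; warping the NIR image with
        # H overlays it on the RGB image for registration and later fusion.
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return H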

Exemplary Methods for Training and Use of ML-Enhanced Feature Matching Model

Turning now to FIG. 4, a flow chart 400 illustrating a method of training and use of an ML-enhanced feature matching model is shown, according to various embodiments. First, at Step 402, the method 400 may obtain aligned multispectral image data (e.g., aligned NIR and RGB image data, as has been discussed previously). Next, at Step 404, the method 400 may generate feature descriptors for the aligned multispectral image data. Next, at Step 406, the method 400 may generate a training set of matching (and mismatching) feature descriptor pairs, optionally including a strength value for each pair. As described above, according to some embodiments, the strength of a match between a given pair of features may be based on the closeness of their coordinates within their respective images, as opposed to the similarity between the features' respective feature descriptors. Next, at Step 408, the method 400 may train a machine learning model using the generated training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data.

If the model developed at Step 408 is to be applied at inference time, the method 400 may proceed to Step 410 and obtain unaligned multispectral image data (e.g., again, NIR and RGB image data, but perhaps from different camera units or apertures that are spatially-distinct from one another, thus providing no 1:1 “ground truth” correspondence of a given identified feature's location within each image). Next, at Step 412, the method 400 may generate feature descriptors for the unaligned multispectral image data. At Step 414, the method 400 may use the trained ML model to identify matching features in the obtained unaligned multispectral image data (i.e., based on the feature descriptors generated at Step 412) and register the unaligned multispectral image data using the identified matching features. As mentioned above, at Step 416, once the images are registered, various desired image processing operations may be performed on the registered multispectral image data, e.g., an image fusion operation, to generate an enhanced output image. (Note: the optionality of the performance of inference-time Steps 410 through 416 is indicated by the use of dashed line boxes in FIG. 4.)

Exemplary Electronic Computing Devices

Referring now to FIG. 5, a simplified functional block diagram of illustrative programmable electronic computing device 500 is shown, according to one embodiment. Electronic device 500 could be, for example, a mobile telephone, personal media device, portable camera, or a tablet, notebook, or desktop computer system. As shown, electronic device 500 may include processor 505, display 510, user interface 515, graphics hardware 520, device sensors 525 (e.g., proximity sensor/ambient light sensor, accelerometer, inertial measurement unit, and/or gyroscope), microphone 530, audio codec(s) 535, speaker(s) 540, communications circuitry 545, image capture device 550, which may, e.g., comprise multiple camera units/optical image sensors having different characteristics or abilities (e.g., Still Image Stabilization (SIS), High Dynamic Range (HDR), Optical Image Stabilization (OIS) systems, optical zoom, digital zoom, etc.), video codec(s) 555, memory 560, storage 565, and communications bus 570.

Processor 505 may execute instructions necessary to carry out or control the operation of many functions performed by electronic device 500 (e.g., the generation and/or processing of images in accordance with the various embodiments described herein). Processor 505 may, for instance, drive display 510 and receive user input from user interface 515. User interface 515 can take a variety of forms, such as a button, keypad, dial, click wheel, keyboard, display screen, and/or touch screen. User interface 515 could, for example, be the conduit through which a user may view a captured video stream and/or indicate particular image frame(s) that the user would like to capture (e.g., by clicking on a physical or virtual button at the moment the desired image frame is being displayed on the device's display screen). In one embodiment, display 510 may display a video stream as it is captured, while processor 505 and/or graphics hardware 520 and/or image capture circuitry contemporaneously generate and store the video stream in memory 560 and/or storage 565. Processor 505 may be a system-on-chip (SOC), such as those found in mobile devices, and may include one or more dedicated graphics processing units (GPUs). Processor 505 may be based on reduced instruction-set computer (RISC) or complex instruction-set computer (CISC) architectures or any other suitable architecture, and may include one or more processing cores. Graphics hardware 520 may be special purpose computational hardware for processing graphics and/or assisting processor 505 in performing computational tasks. In one embodiment, graphics hardware 520 may include one or more programmable GPUs and/or one or more specialized SOCs, e.g., an SOC specially designed to implement neural network and machine learning operations (e.g., convolutions) in a more energy-efficient manner than either the main device central processing unit (CPU) or a typical GPU, such as Apple's Neural Engine processing cores.

Image capture device 550 may comprise one or more camera units configured to capture images, e.g., images which may be processed to generate noise-reduced versions of said captured images, e.g., in accordance with this disclosure. Output from image capture device 550 may be processed, at least in part, by video codec(s) 555 and/or processor 505 and/or graphics hardware 520, and/or a dedicated image processing unit or image signal processor incorporated within image capture device 550. Images so captured may be stored in memory 560 and/or storage 565. Memory 560 may include one or more different types of media used by processor 505, graphics hardware 520, and image capture device 550 to perform device functions. For example, memory 560 may include memory cache, read-only memory (ROM), and/or random access memory (RAM). Storage 565 may store media (e.g., audio, image, and video files), computer program instructions or software, preference information, device profile information, and any other suitable data. Storage 565 may include one or more non-transitory storage media including, for example, magnetic disks (fixed, floppy, and removable) and tape; optical media, such as CD-ROMs and digital video disks (DVDs); and semiconductor memory devices, such as Electrically Programmable Read-Only Memory (EPROM) and Electrically Erasable Programmable Read-Only Memory (EEPROM). Memory 560 and storage 565 may be used to retain computer program instructions or code organized into one or more modules and written in any desired computer programming language. When executed by, for example, processor 505, such computer program code may implement one or more of the methods or processes described herein. Power source 575 may comprise a rechargeable battery (e.g., a lithium-ion battery, or the like) or other electrical connection to a power supply, e.g., to a mains power source, that is used to manage and/or provide electrical power to the electronic components and associated circuitry of electronic device 500.

It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention, therefore, should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

What is claimed is:
1. A device, comprising: a memory; and one or more processors operatively coupled to the memory, wherein the one or more processors are configured to execute instructions causing the one or more processors to: obtain aligned multispectral image data; generate a first plurality of feature descriptors for features identified in the aligned multispectral image data; generate a training set of feature descriptor pairs based on the generated first plurality of feature descriptors; and train a machine learning (ML) model based on the generated training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data.
2. The device of claim 1, wherein the aligned multispectral image data comprises a first plurality of aligned multispectral images.
3. The device of claim 2, wherein each aligned multispectral image comprises at least a first portion of image data in a first spectrum and at least a second portion of image data in a second spectrum, wherein the first spectrum and second spectrum are in at least partially non-overlapping wavelength ranges, and wherein the first portion and second portion are aligned.
4. The device of claim 2, wherein the first plurality of aligned multispectral images comprises sets of images having aligned visible image data and non-visible image data.
5. The device of claim 4, wherein the visible image data comprises RGB image data, and wherein the non-visible image data comprises infrared (IR) image data.
6. The device of claim 2, wherein the first plurality of aligned multispectral images comprises images captured by a first image capture device.
7. The device of claim 3, wherein the first plurality of feature descriptors for the aligned multispectral image data comprises: a first set of feature descriptors representing features in the first portion of image data in the first spectrum; and a second set of feature descriptors representing features in the second portion of image data in the second spectrum.
8. The device of claim 7, wherein at least one of the feature descriptor pairs in the generated training set comprises a match between a feature represented in the first set of feature descriptors and a feature represented in the second set of feature descriptors.
9. The device of claim 8, wherein each of the feature descriptor pairs in the generated training set further comprises a strength value.
10. The device of claim 9, wherein the strength value of each feature descriptor pair is based, at least in part, on a distance between: a location within a given image of the first plurality of aligned multispectral images of a respective feature represented in the first set of feature descriptors; and a location within the given image of the respective feature represented in the second set of feature descriptors.
11. The device of claim 1, wherein at least one of the first plurality of feature descriptors comprises a description of: a Scale-Invariant Feature Transform (SIFT) feature, a Speeded-Up Robust Features (SURF) feature, an Oriented FAST and Rotated BRIEF (ORB) feature, a Binary Robust Invariant Scalable Keypoints (BRISK) feature, or a Binary Features from Robust Orientation Segment Tests (BFROST) feature.
12. The device of claim 1, wherein the one or more processors are further configured to execute instructions causing the one or more processors to: obtain unaligned multispectral image data; generate a second plurality of feature descriptors for features identified in the unaligned multispectral image data; and use the trained ML model to determine matches between features represented by the second plurality of feature descriptors for the unaligned multispectral image data.
13. The device of claim 12, wherein the one or more processors are further configured to execute instructions causing the one or more processors to: perform an image registration operation on the unaligned multispectral image data based, at least in part, on the determined matches between features represented by the second plurality of feature descriptors for the unaligned multispectral image data, to generate second aligned multispectral image data.
14. The device of claim 13, wherein the one or more processors are further configured to execute instructions causing the one or more processors to: perform a fusion operation on the second aligned multispectral image data to generate an enhanced output image.
15. The device of claim 12, wherein the unaligned multispectral image data comprises one or more images comprising image data captured by two or more image capture devices.
16. A non-transitory computer readable medium comprising computer readable instructions executable by one or more processors to: obtain aligned multispectral image data; generate a first plurality of feature descriptors for features identified in the aligned multispectral image data; generate a training set of feature descriptor pairs based on the generated first plurality of feature descriptors; and train a machine learning (ML) model based on the generated training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data.
17. The non-transitory computer readable medium of claim 16, wherein the aligned multispectral image data comprises a first plurality of aligned multispectral images.
18. The non-transitory computer readable medium of claim 17, wherein each aligned multispectral image comprises at least a first portion of image data in a first spectrum and at least a second portion of image data in a second spectrum, wherein the first spectrum and second spectrum are in at least partially non-overlapping wavelength ranges, and wherein the first portion and second portion are aligned.
19. The non-transitory computer readable medium of claim 18, wherein the first plurality of feature descriptors for the aligned multispectral image data comprises: a first set of feature descriptors representing features in the first portion of image data in the first spectrum; and a second set of feature descriptors representing features in the second portion of image data in the second spectrum.
20. The non-transitory computer readable medium of claim 19, wherein at least one of the feature descriptor pairs in the generated training set comprises a match between a feature represented in the first set of feature descriptors and a feature represented in the second set of feature descriptors.
21. The non-transitory computer readable medium of claim 20, wherein each of the feature descriptor pairs in the generated training set further comprises a strength value.
22. An image processing method, comprising: obtaining aligned multispectral image data; generating a first plurality of feature descriptors for features identified in the aligned multispectral image data; generating a training set of feature descriptor pairs based on the generated first plurality of feature descriptors; and training a machine learning (ML) model based on the generated training set of feature descriptor pairs, wherein the trained ML model is configured to determine matches between features in unaligned multispectral image data.
23. The method of claim 22, further comprising: obtaining unaligned multispectral image data; generating a second plurality of feature descriptors for features identified in the unaligned multispectral image data; and using the trained ML model to determine matches between features represented by the second plurality of feature descriptors for the unaligned multispectral image data.
24. The method of claim 23, further comprising: performing an image registration operation on the unaligned multispectral image data based, at least in part, on the determined matches between features represented by the second plurality of feature descriptors for the unaligned multispectral image data, to generate second aligned multispectral image data.
25. The method of claim 24, further comprising: performing a fusion operation on the second aligned multispectral image data to generate an enhanced output image.